-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for the new RNTuple format #395
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work, thanks for this. My questions/comments are either minor things, or some design questions. I guess the most important one is, whether it would make sense to introduce similar helper structs as for the TTree versions to keep category information together and remove a few of the maps that all have the same keys at the moment (as far as I understood).
Another one, would be the repetition around the GenericParameter reading/writing, where we should be able to play some template games to drastically reduce it (and additionally make it almost automatic if there should ever by a new type added to them).
@@ -25,6 +25,7 @@ using VectorMembersInfo = std::vector<std::pair<std::string, void*>>; | |||
*/ | |||
struct CollectionWriteBuffers { | |||
void* data{nullptr}; | |||
void* vecPtr{nullptr}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely happy with this one. But at the moment we seem to not have any other option(?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet, it's in my todo list
I have implemented almost all the comments. The simplification for writing and reading the generic parameters adds a couple of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Mainly a few questions for my understanding and minor variable/function name nitpicking for better readability.
Making the second getMap<T>
public for the GenericParameters is not really an issue, since they have effectively been public before as well.
src/ROOTNTupleReader.cc
Outdated
auto keys = keyView(entNum); | ||
auto values = valueView(entNum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding. These are really just views to the data, right? There is no way we can "steal" them without destroying things. I.e. we cannot simply read this things once and then move them into the GenericParameter
maps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Do you mean reading them at once for all entNum
and then moving? I don't think that would change anything, values[i]
here is already a vector and we are writing in a map<string, vector<T>>
so we need to do it one by one anyway. But one thing that can be done is to cast to an rvalue
so that values
can be moved in the next line:
for (size_t i = 0; i < keys.size(); ++i) {
params.getMap<T>()[keys[i]] = std::move(values[i]);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, my thinking was that we could provide some functionality for GenericParameters
that simply takes the keys vector<string>&&
and the values vector<vector<T>>&&
and then we move them into that function and inside that function we would move the individual elements into the internal map. But maybe something like:
params.getMap<T>().emplace(std::move(keys[i]), std::move(values[i]));
would be good enough. In this way we would also move the keys instead of copying them. Anyhow, all of this only works if we don't accidentally steal the things from the RNTuple view. I am not sure about that, maybe we have to make copies in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked and the values are being moved inside the function because the vectors are left empty but if you call GetView
again then you get the values again which means that at least one copy is being made (I assume it's not reading twice). So I'm actually not sure the moves will do anything. But I guess there is no downside to std::move
. I guess we are only changing when the copies are made
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I suppose the GetView
will have to do some reading and potentially uncompressing. Maybe it skips this if the thing it views is non-empty. In any case, since we call GetView
only once per event, and we don't break things by move
-ing, I agree that we keep std::move
here and at least avoid unnecessary copies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok this is done now moving both keys and values, I also removed the intermediate variables since they are not really used more times than that one
src/ROOTNTupleReader.cc
Outdated
ROOTFrameData::BufferMap buffers; | ||
auto dentry = m_readers[category][0]->GetModel()->GetDefaultEntry(); | ||
|
||
std::map<std::pair<std::string, int>, std::vector<podio::ObjectID>*> tmp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly here: This map of vector<ObjectID>*
is necessary, because we cannot directly "connect" the buffers.references[i].get()
to the vector*
that is expected in CaptureValueUnsafe
?
If that is the case, could you put a small comment here stating why we need this map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to buffers.references[i]
being a unique_ptr
I think. I'll double check but I think passing either the buffers.references[i]
or the buffers.references[i].get()
both crash and that's why I had to choose that not very elegant solution since I don't know better 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, pity. Would have been nice if the get()
version would have worked. But if both crash, then this is the way it has to be for the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it even was commented out above. I also found another way of doing this that works which consisted on starting with making an unique_ptr
:
refCollections->at(j) = std::make_unique<std::vector<podio::ObjectID>>();
and then reading (on a new
vector so it's not deleted) and then making the unique_ptr
point to it (relying on 0-sized unique_ptr
s) but I think I settled for the current way because it's less confusing; it's probably there in the history.
Ok now this should be properly formatted |
Updated after #394 |
Writer updated to use a bare model from the comments from Jakob in e7e7e7e. It isn't that elegant since not everything was being written through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this. Does this mean we can drop one of the void*
in the CollectionBuffers? Or do we have to keep them around for now? If there is no easy way around it, then I would probably be OK with merging this for now and properly documenting that the buffers are meant for internal use only and that they are subject to changes.
I once more also have a question about some RNTuple details that I missed before.
src/ROOTNTupleWriter.cc
Outdated
const auto typeName = "vector<" + type + ">"; | ||
const auto brName = root_utils::vecBranch(name, i++); | ||
auto ptr = *(std::vector<int>**)vec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const auto typeName = "vector<" + type + ">"; | |
const auto brName = root_utils::vecBranch(name, i++); | |
auto ptr = *(std::vector<int>**)vec; | |
const auto brName = root_utils::vecBranch(name, i++); | |
auto ptr = *(std::vector<int>**)vec; |
The typeName
seems to be unused, so I suppose this works because the necessary type information has been passed to the whole thing when we create the fields? And the vector<int>
cast works for all types that potentially come along here and is simply necessary for getting to the correct void*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I don't think there is a typename for the other members, must have forgotten to remove it when copying and pasting from the TTree reader. I should double check that it works for every type, but in principle it's the same three star cast used in other places:
podio/include/podio/CollectionBuffers.h
Line 41 in 930e600
return *static_cast<std::vector<T>**>(raw); |
No way of getting rid of the extra pointer that I know of, the last changes were to change how the RNTuple model was created, now it is a bare one which doesn't do instantiation (before it was a default one which does). This is unrellated to the |
@jmcarcell - can you update your PR ? would like to see whether we can merge today |
clang-tidy seems to break, because the root version in that LCG release is not yet recent enough: https://github.com/AIDASoft/podio/actions/runs/5386875055/jobs/9777453794?pr=395#step:4:416 I think now that we have LLVM in the key4hep stack we could also switch to that for the CI. However, we should do that in a separate PR, after we have merged this. There are also some clang-format complaints. I suppose these are due to different versions running on your machine and the CI(?). Can you fix those? That would make it easier to have one global commit for when we switch to a newer version of clang-format. |
Now |
@jmcarcell - can you spell out the branch name issues we discussed yesterday? Thanks! |
Updated now to conform to #405, which was the last thing pending, at least that I remember. The reader needs some cleanup of these changes but I think that will have to wait for tomorrow 😪
writing and reading the same frame:
|
@jmcarcell - thanks a lot! Do you think that's it for now? If yes, I will redo the review |
Yes I don't think I have anything else (other |
@jmcarcell - thanks a lot. I think it is fine as it is now. We will do the cleanups in another PR then :-) |
* Add a RNTuple writer * Cleanup and add a reader * Add compilation instructions for RNTuple * Add tests * Fix the reader and writer so that they pass most of the tests * Commit missing changes in the header * Add support for Generic Parameters * Add an ugly workaround to the unique_ptr issue * Read also vector members and remove some comments * Do a bit of cleanup * Do more cleanup, also compiler warnings * Add names in rootUtils.h, fix a few compiler warnings * Add a few minor changes * Add missing changes in the headers * Change map -> unordered_map and use append in CMakeLists.txt * Simplify writing and reading of generic parameters * Only create the ID table once * Add CollectionInfo structs * Add a ROOT version check * Add missing endif() * Add Name at the end of some names * Add missing Name at the end * Cast to rvalue * Cache entries and reserve * Add comment and remove old comments * Remove a few couts * Remove intermediate variables and use std::move * Run clang-format * Use clang-format on tests too * Enable RNTuple I/O in Key4hep CI * Check if dev3 workflows come with recent enough ROOT * Change MakeField to the new signature * Update the RNTuple reader and writer to use the buffer factory * Run clang-format * Update the RNTuple writer to use a bare model * Add friends for Generic Parameters * Update changes after the changes in the collectionID and string_view * Run clang-format * Update the reader and writer to conform to AIDASoft#405 * Reorganize and clean up code in the reader * Run clang-format * Simplify how the references are filled --------- Co-authored-by: jmcarcell <[email protected]> Co-authored-by: tmadlener <[email protected]>
Add a writer and a reader using the podio Frame format. All tests pass and it doesn't break the other writes but the cost for that is a couple of ugly workarounds (see #393 for some of the issues that were found while doing this). My idea is that this is merged first and then we worry about the workarounds if fixes have not been found in time, otherwise this PR will get quite complicated.
For running with this a recent version of ROOT is needed, 6.28.00 or newer (I'm not sure 100% if it works with 6.28.00 since I have been running off master). There is a new option in the CMakeLists to enable support for this by asking cmake to find the relevant ROOT library. When this option is enabled (
-DENABLE_RNTUPLE
), the writer reader and tests will be compiled, otherwise nothing should have changed.The writer and reader have a very similar interface to the current TTree-based frame ones. The tests are the same for both. How the writer and reader work is a bit different, of course, from the TTree-based. For example, RNTuple doesn't support mapped types right now so for the generic parameters there is a 'disentangling' from the map types to simple vector types that are saved in the rntuple and then reconstructed in the reader (from vector to map types). I tried to make it as simple as possible by not using extra types and trying to pull few headers from podio.
It's still not fully ready for merging as cleanups need to be done and more comments are needed in several places. Also the support for reading from multiple files is not complete right now. It is a bit more tricky than the TTree case since there is no equivalent to a TChain as far as I know.
BEGINRELEASENOTES
ENDRELEASENOTES
Supersedes #247